Slide 2 - Why did we use R?
Topic:
A bone marrow transplant replaces damaged or diseased bone marrow with healthy blood-forming stem cells.
Methodology:
The R programming language is a powerful tool that allowed us to clean, organize, manipulate, and visualize the data. As a result, we were able to explore how the different variables correlate with survival after the transplant.
Data Wrangling
Dataset: Bone marrow transplant: Children (Donated 20/04/2020)
File: bone-marrow.arff
An Attribute-Relation File Format (.arff) file:
- Contains two distinct sections: Header & Data
- Comments start with “%”
- The Header contains the metadata:
  - Name of the relation
  - List of attributes
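The project read the file in R. Purely to illustrate the format described above, here is a minimal Python sketch that splits ARFF text into its header information and data rows. The function name `parse_arff` and the miniature sample text are hypothetical, and the sketch deliberately ignores parts of the format such as quoted names containing spaces and sparse rows.

```python
def parse_arff(text):
    """Split ARFF text into (relation, attributes, rows).

    Simplified sketch: no quoted names with spaces, no sparse rows.
    """
    relation = None
    attributes = []   # (name, type) pairs declared in the header
    rows = []         # lists of string values from the data section
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and "%" comments
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1].strip("'\"")
        elif lower.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name.strip("'\""), attr_type))
        elif lower.startswith("@data"):
            in_data = True  # everything after this line is data
    return relation, attributes, rows


# Hypothetical miniature file, not taken from the real dataset.
SAMPLE = """% toy example
@relation toy
@attribute age numeric
@attribute status {0,1}
@data
9.6,0
4.0,1
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(attributes)
print(rows)
```

Keeping the header (relation and attributes) separate from the rows mirrors the Header & Data split of the file itself.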
Steps:
- Download the file into newly created data folders
- Extract the metadata and the data into two compressed files: metadata.txt.gz (plain text) and data.tsv.gz (tidy, tab-separated data)
- Augment columns & binary values
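The extraction step above can be sketched as follows. This is a Python illustration under stated assumptions, not the R code used in the project: the helper `write_tidy_files`, its parameters, and the toy header lines and rows are all hypothetical. Only the two output file names come from the steps listed above.

```python
import csv
import gzip
import os
import tempfile


def write_tidy_files(header_lines, column_names, rows, out_dir):
    """Write header text and tabular data as two gzip-compressed files."""
    os.makedirs(out_dir, exist_ok=True)  # create the data folder if missing
    meta_path = os.path.join(out_dir, "metadata.txt.gz")
    data_path = os.path.join(out_dir, "data.tsv.gz")

    # Metadata: the header lines, stored as plain text.
    with gzip.open(meta_path, "wt", encoding="utf-8") as fh:
        fh.write("\n".join(header_lines) + "\n")

    # Data: one header row with column names, then one row per observation.
    with gzip.open(data_path, "wt", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(column_names)
        writer.writerows(rows)
    return meta_path, data_path


# Hypothetical toy content, not the real dataset.
out_dir = os.path.join(tempfile.mkdtemp(), "data")
meta_path, data_path = write_tidy_files(
    ["@relation toy", "@attribute age numeric", "@attribute status {0,1}"],
    ["age", "status"],
    [["9.6", "0"], ["4.0", "1"]],
    out_dir,
)

with gzip.open(data_path, "rt", encoding="utf-8") as fh:
    print(fh.read())
```

Writing the column names as the first row of the .tsv keeps the data file readable on its own, without the metadata file.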
Recipient age, disease type, and recipient body mass index vs. survival rate
The boxplot compares the age distribution of recipients across the different disease types, split by survival status: alive or dead.
Relevant outcomes:
Patients with lymphoma had a survival rate of 0%.
Age might be associated with mortality: older patients showed higher mortality than younger ones.
Every disease type exhibited higher mortality than survival.
The bar plot shows the relationship between BMI and survival rate: survival decreases as BMI increases. Underweight patients had the highest number of survivors, while the obese group was the only category in which mortality exceeded survival.
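The per-group comparison behind a bar plot like this one comes down to a survival rate per category. The following Python sketch illustrates that calculation only. It is not the project's R code: the function name `survival_rate_by_group` is hypothetical, and the records are made up for illustration and do not reproduce the study's data.

```python
from collections import defaultdict


def survival_rate_by_group(records):
    """Return {group: share of survivors} from (group, survived) pairs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for group, survived in records:
        if survived:
            counts[group][0] += 1
        counts[group][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}


# Made-up records for illustration only; not the study's data.
toy_records = [
    ("underweight", True), ("underweight", True), ("underweight", False),
    ("normal", True), ("normal", False),
    ("obese", False), ("obese", False), ("obese", True),
]

rates = survival_rate_by_group(toy_records)
for group, rate in rates.items():
    print(f"{group}: {rate:.2f}")
```

A rate below 0.5 means mortality exceeds survival in that group, which is the comparison the bar plot makes visually.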